Data Check Plans

This markdown documents the data cleaning and diagnostic checks:

  1. Import and clean data:
    1. top chunk and tidy_cannon function
  2. Identify variables to transform
    1. define set_vars and create the variables with create_vars
  3. Look for irregular rows (check_irreg):
    1. hits that should be misses, misses that should be hits
    2. systematically weird variation (rts that are too long, too short)
  4. Create summary table and highlight problematic rows
    1. High percentage of no_resp
    2. low percentage of hits
  5. Check RTs and provide info on possible transforms
    1. histograms
    2. means
  6. Graphical representation of trials
    1. PE by hits/misses
    2. PE_angmu by hits/misses
    3. PE_raw by hits/misses
    4. distMean by hits/misses

Adaptive Learning Task (ADLT)

In the cannon task, participants are instructed to place a shield at any point around a circle (Nassar et al., 2019). This shield covers only a portion of the 360-degree circle. The goal of the task is to infer where a cannonball might hit along the circle based on the information gathered from previous cannonball strikes and place the shield at those locations. Before each block of the task, the generative structure of the block is fully explained to the participants via written instructions and practice trials of the task. During the practice trials, participants can see where the cannon is aimed and use that information to place their shield. The cannon could theoretically hit anywhere within a 10-degree range from the center of the cannon’s muzzle. After this practice phase, participants complete the primary experiment where the cannon is removed and they must infer the cannon’s aim. The pattern of the cannon’s aim can either (1) change slightly within a specified range of the circle, with “oddball” cannonball shots striking about 14 percent of the time anywhere along the circle, or (2) remain stationary on most trials, and re-position to a random angle on approximately 14 percent of trials and remain stationary at this new angle. Participants receive explicit instructions regarding the nature of each of these blocks following the instructional phase, meaning they know whether the upcoming block will be from an oddball or change-point distribution. Participants complete 240 experimental trials in each condition, divided into 60-trial blocks, for a total of 480 trials. On each trial, participants adjust the position of their shield and lock in their decision. After their choice has been made, there is a 500ms delay before the location of the cannonball is revealed for 500ms. 
This gives participants an explicit representation of how far the center of the shield fell from the cannonball’s strike, i.e., visual feedback about their PE relative to the cannonball’s actual location (Figure 2a, ii). After participants are shown how far off they were, the outcome of the trial is revealed (Figure 2a, iii). After another brief delay (1000ms), the size of the participant’s shield is revealed alongside the cannonball, showing whether they successfully blocked the projectile. While the shield is always centered on the participant’s chosen location, it varies in size such that participants are never totally certain they will block the cannonball. Thus, minimizing the PE in the feedback phase is the optimal strategy for performing well in the task. At the end of a block, participants are told how many cannonballs they caught as a percentage of the total.
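Because positions are angles on a 360-degree circle, the PE between shield center and strike is the shortest arc between them. A minimal sketch of such a computation (a hypothetical helper; the discrep function used in the cleaning code below may be implemented differently):

```r
# Hypothetical angular-distance helper (the pipeline's discrep() may differ).
# Returns the shortest arc between two angles, in [0, 180].
ang_discrep <- function(a, b) {
  d <- abs(a - b) %% 360   # wrapped absolute difference, in [0, 360)
  pmin(d, 360 - d)         # take the shorter way around the circle
}

ang_discrep(350, 10)  # 20, not 340
```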

Pilot Data

TL;DR for Data Cleaning

The dataframe resulting from this cleaning is an example dataset. Because the cleaning code will be updated, it does not need to be reviewed unless you suspect anomalies in the resulting dataframe. For any questions related to the definition of task variables, please refer to: https://docs.google.com/spreadsheets/d/1V2UD28C_zAfH90BnNO3si6c0-qDUDFbHoU44rfdEt4I/edit#gid=1309555968. Skip to heading Pilot Results for information on pilot data.

It should also be noted that data for this pilot came from two slightly different versions of the ADLT. The first version failed to record trial latency; thus, latencies were only examined for a subset of participants.

Source Directories

#### Setup - Choose options

print_plots <- "cp" # could be "ob" (oddball)
get_model_params <- TRUE
pilot <- FALSE #if pilot data, will read August Pilot Data
1. Load Pilot Data…
#load data

if (pilot == FALSE){
  cannon_data_raw <- data.table::fread("~/github_repos/Cannon_Task_Inquisit/Data/PUBS_Batch1_C.csv", fill = TRUE) %>% 
  dplyr::filter(subject != 1) %>% 
  dplyr::filter(subject != "subject") 
} else{
  cannon_data_raw <- data.table::fread("~/github_repos/Cannon_Task_Inquisit/Data/PUBS_Batch1_C_Samp.csv", fill = TRUE) %>% 
  dplyr::filter(subject == 1) %>% 
  dplyr::filter(subject != "subject") 
}

#FILTER OUT TIMES BEFORE OFFICIAL DATA COLLECTION
load("cannon_processing.Rdata")
#load("Cannonball_Pilot_Cleaned_Data.Rdata")
iq_names <- colnames(cannon_data_raw)

# #If it's a pilot without numbers, do this
# if (length(unique(cannon_data_raw$subject)) == 1){ #i.e., subject IDs were not recorded
#   cannon_data_raw <- cannon_data_raw %>% group_by(time) %>% mutate(subject= cur_group_id())
#   unique(cannon_data_raw$subject)
# }

1. Clean Pilot Data…

It is important to get rid of excess trials and retain only the information that will matter in analysis.

if (length(Raw) == length(Chopped)){
  print("Data Successfully Chopped")
} else {
  print("Double-Check Prep")
  #Batch1: 815602 did not complete task -fine to be off in length
}
## [1] "Double-Check Prep"

2. Transform Pilot Data…

This chunk sets new variables and alters any old ones that need to be tidied. The variables created by the set_vars vector are:

- predErr: absolute prediction error (calculated using the discrep function); the PE from outcome to shield placement.
- distMean: prediction error from the mean of the distribution to shield placement. Calculated with the discrep function.
- catch_miss: codes values as the factors “catch”, “miss”, “noresp”. Used for coding and mean comparison.
- changepoint: a column that reflects the trial number of changepoints; contains the trial number when a changepoint occurs and NA when no changepoint has occurred.
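As an illustration of how two of these could be derived (a sketch only; create_vars is the authoritative implementation, and the outcome_hit and is_changepoint flag names here are hypothetical):

```r
# Sketch only -- create_vars() is the real implementation.
# outcome_hit / is_changepoint are hypothetical flag names for illustration;
# outcomeindex == 1 codes a non-response, as in the summary table below.
cannon_data <- cannon_data %>%
  dplyr::mutate(
    catch_miss = factor(dplyr::case_when(
      outcomeindex == 1 ~ "noresp",
      outcome_hit  == 1 ~ "catch",
      TRUE              ~ "miss"
    )),
    # trial number when a changepoint occurs, NA otherwise
    changepoint = dplyr::if_else(is_changepoint == 1, trialnum, NA_real_)
  )
```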

Table 1. High-Level Task Performance. This check returns block-level info about subjects’ performance. Participants are flagged if they failed to respond on more than half of the trials.

set_vars <- c("predErr", "distMean", "catch_miss", "changepoint", "perf", "total_trialnum" ) ##name variables needed for analysis, add as they become apparent 
trim_cols <- TRUE
cannon_data <- create_vars(cannon_data, set_vars) ## Initialize rows to calculate
cannon_data <- cannon_data %>% ungroup() %>% group_by(subject) %>% mutate(cBal = ifelse(cond[1] == "CHANGEPOINT", 2, 1))

 if (trim_cols == TRUE){
   # all() collapses the column to a single logical, so the if() condition is scalar;
   # the else branch keeps the second drop vector from silently overwriting the first
   if (all(cannon_data$percentTrialsStay == .86, na.rm = TRUE)){
      drop <- c("percentTrialsStay","blockcode","trialcode",
                "block.InstructionBlock.timestamp","trial.begin_block.timestamp",
                "trial.mainloop.timestamp","trial.placeshield_mouse.timestamp","trial.showPE.timestamp",
                "trial.cannon_outcome.timestamp","picture.shield.currentitem")
   } else {
     drop <- c("blockcode","trialcode",
               "block.InstructionBlock.timestamp","trial.begin_block.timestamp",
               "trial.mainloop.timestamp","trial.placeshield_mouse.timestamp",
               "trial.showPE.timestamp",
               "trial.cannon_outcome.timestamp","picture.shield.currentitem")
   }
    cannon_data <- cannon_data[,!(names(cannon_data) %in% drop)]
 }

cleaned_names <- colnames(cannon_data)
save(model_names, iq_names, cleaned_names, file = "cannon_processing.Rdata")

3. Checks for other idiosyncrasies?

It’s probably worth writing this into a function as well, and I should probably do so before I implement high-level checks. Things that might need checking:

  1. Miscodes: misses that are actually hits, hits that are actually misses, etc.
  2. Odd numbers of trials
  3. Participants bailing early on the task, getting stuck somewhere, etc.
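One such check, sketched below (hypothetical; check_irreg’s actual logic may differ): a trial coded as a miss whose outcome nonetheless landed between the shield edges should be flagged for review.

```r
# Hypothetical sketch of one check_irreg()-style rule: flag "miss" trials
# whose outcome fell between the shield edges (angledown, angleup).
# Note: this naive version ignores the 0/360 wraparound at the top of the circle.
flag_suspect_misses <- function(df) {
  df %>%
    dplyr::mutate(Irreg = dplyr::if_else(
      catch_miss == "miss" & !is.na(outcome) &
        outcome >= angledown & outcome <= angleup,
      "CHECK_MISS", NA_character_))
}
```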

In this pilot data, we are still getting occasional hits and misses that don’t make sense. I think this is due to

cannon_data <- check_irreg(cannon_data) #check for any mistaken hits
irreg_plot_hit <- cannon_data %>% filter(grepl("CHECK_HIT", Irreg)) %>% select(subject, cond, blocknum, trialnum, catch_miss, angmu, placementAngle, outcome, predErr, angleup, angledown, shieldsize)
irreg_plot_miss <- cannon_data %>% filter(grepl("CHECK_MISS", Irreg)) %>%
  select(subject, cond, blocknum, trialnum, catch_miss, trial.placeshield_mouse.latency, angmu, placementAngle, prev_placementAngle, outcome, predErr, picture.shield.stimulusonset, angleup, angledown, shield_size, shieldsize) %>%
  filter(!(placementAngle == prev_placementAngle))
irreg_plot_NA <- cannon_data %>% filter(grepl("CHECK_NA", Irreg)) %>% select(subject, cond, blocknum, trialnum, catch_miss, angmu, placementAngle, outcome, predErr, angleup, angledown, shieldsize)
range <- cannon_data %>% 
  group_by(subject, time) %>% 
  # bare column names (not cannon_data$...) so min/max respect the grouping
  # instead of returning the whole-dataframe extremes for every group
  summarise(placement_angle_min = min(placementAngle, na.rm = TRUE),
            placement_angle_max = max(placementAngle, na.rm = TRUE),
            outcome_min = min(outcome, na.rm = TRUE),
            outcome_max = max(outcome, na.rm = TRUE),
            angleup_min = min(angleup, na.rm = TRUE),
            angleup_max = max(angleup, na.rm = TRUE),
            angledown_min = min(angledown, na.rm = TRUE),
            angledown_max = max(angledown, na.rm = TRUE))

if(nrow(irreg_plot_hit) > 0){
  irreg_plot_hit
}
if(nrow(irreg_plot_miss) > 0){
  irreg_plot_miss
}
## # A tibble: 25 x 16
## # Groups:   subject [18]
##    subject cond      blocknum trialnum catch_miss trial.placeshield_mouse… angmu
##      <dbl> <fct>        <dbl>    <dbl> <chr>                         <dbl> <dbl>
##  1  117568 ODDBALL          1       10 miss                            284 111. 
##  2  117568 CHANGEPO…        2       50 miss                            645 300. 
##  3  117568 CHANGEPO…        4       30 miss                           2152  28.6
##  4  127306 CHANGEPO…        4        7 miss                            380  95.9
##  5  286390 ODDBALL          3       33 miss                            300 257. 
##  6  326980 ODDBALL          1       10 miss                            634 291. 
##  7  390664 CHANGEPO…        1       22 miss                            149 216. 
##  8  390664 CHANGEPO…        3       28 miss                            375 262. 
##  9  545599 ODDBALL          3       10 miss                           1940 277. 
## 10  642086 ODDBALL          2       54 miss                            231 310. 
## # … with 15 more rows, and 9 more variables: placementAngle <dbl>,
## #   prev_placementAngle <dbl>, outcome <dbl>, predErr <dbl>,
## #   picture.shield.stimulusonset <dbl>, angleup <dbl>, angledown <dbl>,
## #   shield_size <dbl>, shieldsize <dbl>
if(nrow(irreg_plot_NA) > 0){
  irreg_plot_NA
}

4. Create Table of Summary statistics

obj <- cannon_data %>% group_by(subject, cond, blocknum, totalearnings) %>% 
  summarise(obscount = n(), avg_PE = mean(predErr, na.rm = TRUE),
            dist_mean = mean(distMean,na.rm = TRUE),
            percent_caught = max(cannonballs_caught/max(trialnum)),
            percent_noresp = (sum(outcomeindex == 1)/max(trialnum)),
            num_changepoints = sum(!is.na(changepoint))) %>%
  arrange(subject,totalearnings) %>% group_by(subject) %>% mutate(taskearnings = max(totalearnings)) %>% mutate(Avg_overall_PE = mean(avg_PE, na.rm = TRUE)) %>% ungroup()
#save(obj, cannon_data, file = "Cannonball_Pilot_Example_Perfect_Data.Rdata")

colormatrix <- ifelse(obj$percent_noresp >= .2 | obj$obscount != 60, wes_palette("Cavalcanti1")[c(1)], "white") ##vectorized `|` (not `||`, which only checks the first element); potentially save these as bad_blocks vector
bad_blocks <- obj %>% dplyr::filter( percent_noresp >= .2)

tab <- obj %>% flextable() %>% flextable::bg(j = 1:ncol(obj), bg=colormatrix)

cannon_earnings <- cannon_data %>% group_by(subject) %>% summarize(cents_earned = max(totalearnings)) %>% dplyr::rename(centsearned_c = cents_earned)
save(cannon_earnings, file = "~/github_repos/PUBS_Data_Verification/Payment/Cannon.Rdata")
tab

5. Reaction time data

Figure 1. Below is a histogram of all reaction time data. This is beautiful! I tried it both with and without subject 456, who seems a little problematic based on the summary statistics. Overall, I’m happy with this!

Hist_outliers <- ggplot(cannon_data %>% filter(!subject == 456), aes(x=trial.placeshield_mouse.latency)) + geom_histogram()
Mean_cleaned <- cannon_data %>% filter(!subject == 456) %>% ungroup () %>% filter(!trial.placeshield_mouse.latency == 2500) %>% summarise(mean = mean(trial.placeshield_mouse.latency)) 
Hist_outliers; Mean_cleaned

## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  430.
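Step 5 also calls for information on possible transforms; a quick side-by-side of raw vs. log-transformed RTs is one way to eyeball that (a sketch; the 2500ms cutoff mirrors the timeout filter above):

```r
# Compare raw and log-transformed RT distributions (sketch).
rt <- cannon_data %>%
  dplyr::filter(subject != 456,
                trial.placeshield_mouse.latency > 0,
                trial.placeshield_mouse.latency < 2500)  # drop timeout ceiling

p_raw <- ggplot(rt, aes(x = trial.placeshield_mouse.latency)) +
  geom_histogram(bins = 50) + ggtitle("Raw RT")
p_log <- ggplot(rt, aes(x = log(trial.placeshield_mouse.latency))) +
  geom_histogram(bins = 50) + ggtitle("log(RT)")
cowplot::plot_grid(p_raw, p_log, ncol = 2)
```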

6. Graphical Representations by subject

I’m having some trouble with graphing this. For whatever reason, the oddball graphs are not happy with the tabset format. Everything else seems to be playing nicely, and it seems as though participants are learning across the task.

cowplotcp <- list();cowplotodd <- list()
subjects <- sort(unique(cannon_data$subject))
subjects <- as.character(subjects)

subjects_list <- list(as.character(unique(cannon_data$subject)))
for (i in subjects) {
  o <- cannon_data %>% dplyr::filter(subject == i & cond == "ODDBALL") %>% filter(!is.na(predErr))
 cowplotodd[[i]] <- local({
        i <- i
         oddballPE <- ggplot(o, aes(x=trialnum, y=predErr, color = catch_miss)) + geom_point() +
     geom_vline(aes(xintercept = changepoint, color = "Oddball")) + scale_color_manual(values = (wes_palette("Cavalcanti1")[c(4,5,2)])) + xlab("Trial Number") + ggtitle("Raw Prediction Error") + ylab("Size Of Prediction Error") + theme(legend.position = "none") + facet_wrap(~blocknum, ncol = 1)

get_legendo <- ggplot(o, aes(x=trialnum, y=distMean, color = catch_miss)) + geom_point() +
     geom_vline(aes(xintercept = changepoint, color = "Oddball")) + scale_color_manual(values = (wes_palette("Cavalcanti1")[c(4,5,2)]))
 legendo <- get_legend(get_legendo)
 
 oddballDM <- ggplot(o, aes(x=trialnum, y=distMean, color = catch_miss)) + geom_point() +
     geom_vline(aes(xintercept = changepoint, color = "Oddball")) + scale_color_manual(values = (wes_palette("Cavalcanti1")[c(4,5,2)]))  + ggtitle ("Distance from Mean") + xlab("Trial Number") + ylab("Size Of Prediction Error") + theme(axis.title.y=element_blank(), axis.text.y=element_blank(), axis.ticks.y=element_blank())  + theme(legend.position = "none") + facet_wrap(~blocknum, ncol = 1)

i <- cowplot::plot_grid(oddballPE, oddballDM, legendo, ncol = 3, rel_widths = c(2.5,2.1,1))})
}
for (i in subjects){
  c <- cannon_data %>% dplyr::filter(subject == i & cond == "CHANGEPOINT") %>% filter(!is.na(predErr))
  cowplotcp[[i]] <- local({
        i <- i
        changepointPE <- ggplot(c,aes(x=trialnum, y=predErr, color = catch_miss)) + geom_point() + geom_vline(aes(xintercept = changepoint, color = "Changepoint")) + scale_color_manual(values = (wes_palette("Cavalcanti1")[c(2,4,5)])) + xlab("Trial Number") + ggtitle("Raw Prediction Error") + ylab("Size Of Prediction Error") + theme(legend.position = "none") + facet_wrap(~blocknum, ncol = 1)
 
get_legendc <- ggplot(c, aes(x=trialnum, y=distMean, color = catch_miss)) + geom_point() + geom_vline(aes(xintercept = changepoint, color = "Changepoint")) + scale_color_manual(values = (wes_palette("Cavalcanti1")[c(4,5,2)]))
 legendc <- get_legend(get_legendc)
 #c(5,4,2), 2,5,4, 4,5,2, 2,4,5
 changepointDM <- ggplot(c, aes(x=trialnum, y=distMean, color = catch_miss)) + geom_point() + 
   geom_vline(aes(xintercept = changepoint, color = "Changepoint")) + scale_color_manual(values = (wes_palette("Cavalcanti1")[c(4,5,2)])) + ggtitle ("Distance from Mean") + xlab("Trial Number") + ylab("Size Of Prediction Error") + theme(axis.title.y=element_blank(), axis.text.y=element_blank(), axis.ticks.y=element_blank())  + theme(legend.position = "none") + facet_wrap(~blocknum, ncol = 1)
 
i <- cowplot::plot_grid(changepointPE, changepointDM, legendc, ncol = 3, rel_widths = c(2.5,2.1,1))})
}
##print based on the block that came first (use cBal)
# 
# 
# cowplotall <- list()
# for (i in subjects) {
#   cowplotall[[i]] <- local({
#         i <- i
#         cowplot::plot_grid(cowplotcp[[i]], cowplotodd[[i]], nrow = 2)})
# }

Graphing of Block

if(print_plots == "cp") {
  temp_cp <- c(
    "### Subject {{nm}}\n",
    "```{r, echo = FALSE}\n",
    "cowplotcp[[{{nm}}]] \n",
    "```\n",
    "\n"
  )

plots <- lapply(1:length(cowplotcp), function(nm) {knitr::knit_expand(text = temp_cp)})
} else if (print_plots == "ob"){

template <- c(
    "### Subject {{w}}\n",
    "```{r, echo = FALSE}\n",
    "cowplotodd[[{{w}}]] \n",
    "```\n",
    "\n"
    
  )
plots <- lapply(1:length(cowplotodd), function(w) {knitr::knit_expand(text = template)})
}

(Per-subject plot tabs for Subject 1 through Subject 41 render here, one tab per subject.)

a_1 <- aov(trial.placeshield_mouse.latency ~ cond, data = cannon_data)
summary(a_1)
##                Df    Sum Sq Mean Sq F value  Pr(>F)    
## cond            1 4.732e+06 4732209   36.85 1.3e-09 ***
## Residuals   19196 2.465e+09  128416                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 87 observations deleted due to missingness
rt_trialnum <- lm(trial.placeshield_mouse.latency ~ trialnum + cond  + blocknum + cBal, data = cannon_data)
summary(rt_trialnum)
## 
## Call:
## lm(formula = trial.placeshield_mouse.latency ~ trialnum + cond + 
##     blocknum + cBal, data = cannon_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -561.62 -229.94  -89.02  117.28 2141.80 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 664.6259    11.0956  59.900  < 2e-16 ***
## trialnum     -1.5626     0.1476 -10.588  < 2e-16 ***
## condODDBALL -32.9172     5.1146  -6.436 1.26e-10 ***
## blocknum    -31.6437     2.2863 -13.840  < 2e-16 ***
## cBal        -62.6753     5.1290 -12.220  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 354.2 on 19193 degrees of freedom
##   (87 observations deleted due to missingness)
## Multiple R-squared:  0.02494,    Adjusted R-squared:  0.02474 
## F-statistic: 122.7 on 4 and 19193 DF,  p-value: < 2.2e-16
emmeans(rt_trialnum, "cond")
##  cond        emmean   SE    df lower.CL upper.CL
##  CHANGEPOINT    444 3.62 19193      437      451
##  ODDBALL        411 3.62 19193      404      418
## 
## Results are averaged over the levels of: cBal 
## Confidence level used: 0.95
#emmeans(rt_trialnum, "blocknum", at = list(blocknum = 1:4)) #blocknum is numeric, so levels must be supplied via `at`
#maybe give this an option? 

save(cannon_data, file = "~/github_repos/Cannon_Task_Inquisit/Data/cannon_proc.RData")
 # if (modeling_ext == TRUE){
 #   tidy_for_model(cannon_data)
 # }

Issues Warranting Discussion

For Michael Discussion 9/27